# High-Precision Visual Question Answering

Videorefer 7B Stage2.5
Apache-2.0
VideoRefer-7B is a multimodal model built on a video large language model and focused on spatio-temporal object understanding.
Text-to-Video Transformers English
DAMO-NLP-SG
20
2
Llama 3.2V 11B Cot
Apache-2.0
Llama-3.2V-11B-cot is a vision-language model capable of spontaneous, systematic reasoning, built on the LLaVA-CoT framework.
Image-to-Text Transformers English
Xkev
5,089
151
Xgen Mm Phi3 Mini Instruct Singleimg R V1.5
Apache-2.0
xGen-MM is a series of foundational large multimodal models developed by Salesforce AI Research. It builds on the design of the BLIP series and offers stronger multimodal processing capabilities.
Image-to-Text Safetensors English
Salesforce
313
15
Internlm Xcomposer2 Vl 7b
Other
InternLM-XComposer2 is a large vision-language model built on InternLM2, with strong image-text understanding and creation capabilities.
Text-to-Image Transformers
internlm
1,902
82
© 2025 AIbase